Take Home Assignment Part 1

by JUAN DAVID ACOSTA TORRES

INTRODUCTION

Over recent years, rapid technological developments have enabled businesses, at both the low and high ends of the budget spectrum, to 'go online'. E-platforms are now ubiquitous and have profoundly reshaped the way the 'business' and the 'customer' interact. This expansion has not come alone, however: it has prompted the rise of what has been called the 'sharing economy'. This economic model has spread to virtually all aspects of our lives, from transportation and education to food and clothing. Housing, of course, is no exception.

Among the giants that are part of this new model one can easily find Airbnb, a vacation home rental platform (in principle) that connects potential 'hosts' and 'guests' in order "to create a world where anyone can belong anywhere". (https://news.airbnb.com/airbnb-2019-business-update/)

Now, as Airbnb has become a hub not only for individuals seeking short-term solutions but also for more specialized real-estate agencies, the company has met opposition from governments and residents alike, from Tokyo to San Francisco, the place where it all started.

Given its position in the market (in between long-term housing and hotels), the main problems that the company has faced can be summarized as: (1) having a detrimental impact on the long-term housing stock, as it may become more profitable to rent a unit through the platform than to put it on the regular market, and (2) having a detrimental impact on the performance of the hotel industry.

In this sense, the objective of this exercise was to perform a preliminary exploratory analysis of Airbnb in San Diego, covering both the general statistics that describe the overall characteristics of the listings and the more specific spatial relations between the listings and the housing market, in particular whether listings on Airbnb are aimed at tourists or at medium and long-term residents.

MATERIALS

Preparation of the Data

  1. The original file has the values from the column 'price' stored as text elements.
  2. The symbol '$' is removed.
  3. Then, the commas are removed.
  4. Lastly, all the values are converted into float type.
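The cleaning steps listed above can be sketched with pandas as follows. This is a minimal illustration, not the original notebook code: the sample values and the helper name `clean_price` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample that mimics the raw 'price' column (stored as text).
listings = pd.DataFrame({"price": ["$85.00", "$1,250.00", "$40.00"]})


def clean_price(prices: pd.Series) -> pd.Series:
    """Remove '$' and thousands separators, then convert to float."""
    return (
        prices.str.replace("$", "", regex=False)  # literal '$', not a regex anchor
        .str.replace(",", "", regex=False)
        .astype(float)
    )


listings["price"] = clean_price(listings["price"])
print(listings["price"].tolist())
```

Passing `regex=False` matters here because `$` is a special character in regular expressions.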

First of all, just as a visual guide, a histogram is used to check the range over which most prices per night fall. Then, the mean and the standard deviation of all the prices are calculated. Values above the mean + 2std were assumed to be too extreme and were therefore not considered in the analysis.

It has to be noted that this is not an exact way to remove outliers from the data: first, because those same outliers were used to calculate the mean and std, and second, because some of the removed values represent a 'true observation'. However, they were removed in order to make the visualizations clearer.
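As a rough sketch of the mean + 2std rule described above, the cut-off can be computed and applied like this. The sample prices and the function name `price_cutoff` are hypothetical; note that the extreme value itself inflates both the mean and the std, which is the limitation mentioned in the text.

```python
import pandas as pd


def price_cutoff(prices: pd.Series, n_std: float = 2.0) -> float:
    """Upper bound defined as mean + n_std * (sample) standard deviation."""
    return prices.mean() + n_std * prices.std()


# Hypothetical nightly prices with one extreme value.
prices = pd.Series(
    [50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0, 130.0, 5000.0]
)

cutoff = price_cutoff(prices)
kept = prices[prices <= cutoff]  # values above the cut-off are dropped
print(round(cutoff, 1), len(kept))
```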

The new histogram clearly shows that the large majority of homes/apartments/rooms listed on Airbnb have a price per night between around 50 and 150 dollars.

A query was used not only to remove the prices that were considered too extreme, but also to keep only those listings that had at least one review in the last year (ltm: last twelve months). With this, the table to be analyzed is obtained.
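A query of this kind might look like the sketch below. The column name `number_of_reviews_ltm` and the cut-off value are assumptions for illustration; the actual column names and threshold in the original table may differ.

```python
import pandas as pd

# Hypothetical listings table; the column names are assumptions.
listings = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "price": [80.0, 120.0, 900.0, 60.0],
        "number_of_reviews_ltm": [5, 0, 3, 1],
    }
)

price_cutoff = 500.0  # placeholder for the mean + 2std value

# Keep listings that are not too expensive AND were reviewed in the last year.
active = listings.query(
    f"price <= {price_cutoff} and number_of_reviews_ltm > 0"
)
print(active["id"].tolist())
```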

It is worth mentioning that, to obtain many of the results, multiple assumptions were made with no real data to back them up. Therefore, the interpretations that follow could be significantly flawed.

RESULTS

Multiple aspects of the general characteristics of the listings were analyzed. To this end, a number of questions were used to make the process more explicit. These questions, the code used to find their answers, and a brief interpretation of each result follow:

Number of Listings

After the dataset was "cleaned up", the number of listings went from 8688 to 6506, a drop of 2182. As seen previously, the number of listings removed from the table on the basis of price was not high (186). Therefore, most of the drop (at least 1996 listings) was caused by inactive listings.

Listings per host

The names of the first, second and fourth 'hosts' are "710 Vacation", "SeaBreeze" and "WanderJaunt", all vacation property rental companies. The names of the third and fifth 'hosts' are John and Chloe; however, given the number of listings to their name, it is highly unlikely that they are truly private individuals.
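A ranking of hosts by number of listings, like the one discussed above, can be obtained with a group-by. The sketch below uses made-up data, and the column names `host_id` and `host_name` are assumptions.

```python
import pandas as pd

# Hypothetical listings; each row is one listing owned by a host.
listings = pd.DataFrame(
    {
        "host_id": [10, 10, 10, 20, 20, 30],
        "host_name": ["Agency A"] * 3 + ["John"] * 2 + ["Ana"],
    }
)

listings_per_host = (
    listings.groupby(["host_id", "host_name"])
    .size()                          # number of listings per host
    .sort_values(ascending=False)
    .rename("n_listings")
    .reset_index()
)

top_hosts = listings_per_host.head(5)  # the five hosts with the most listings
print(top_hosts)
```

Grouping by `host_id` as well as `host_name` avoids merging different hosts that happen to share a first name.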

Price

Of the 7 neighbourhoods with the most listings (overall), two ('Mission Bay' and 'La Jolla') also seem to have the listings with the highest prices.
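One way to check which neighbourhoods appear both among the most listed and among the most expensive is sketched below. The data are invented and the column name `neighbourhood` is an assumption; only the technique is meant to carry over.

```python
import pandas as pd

# Hypothetical listings with a neighbourhood and a nightly price.
listings = pd.DataFrame(
    {
        "neighbourhood": ["Mission Bay"] * 4 + ["La Jolla"] * 3
        + ["Downtown"] * 2 + ["Midtown"],
        "price": [300.0, 250.0, 350.0, 300.0, 280.0, 260.0, 300.0,
                  100.0, 120.0, 150.0],
    }
)

summary = listings.groupby("neighbourhood")["price"].agg(
    n_listings="count", mean_price="mean"
)

top_by_count = summary.nlargest(2, "n_listings")
top_by_price = summary.nlargest(2, "mean_price")

# Neighbourhoods present in both rankings.
overlap = sorted(set(top_by_count.index) & set(top_by_price.index))
print(overlap)
```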

Based on the 10 highest average prices in each table, it could be said that yes, there is a considerable difference between the average prices offered by the two groups.

Tourism or Residents?

According to the tables above: (1) the number of listings being rented to 'tourists' dramatically surpasses the number of those going to potential medium/long-term residents, (2) as expected, the neighbourhoods with the largest number of listings (overall) also have the largest number of listings rented to 'tourists' (in fact, the two lists are almost the same), and (3) entire homes/apartments are the predominant 'room type' being rented out.
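The text does not state how a listing was assigned to 'tourists' or to 'residents'. As a purely illustrative sketch, the split could be based on the minimum stay; the 30-night threshold and the column names `minimum_nights` and `room_type` below are assumptions, not the criterion used in the original analysis.

```python
import pandas as pd

LONG_STAY_NIGHTS = 30  # hypothetical threshold, not from the original analysis

# Hypothetical listings.
listings = pd.DataFrame(
    {
        "room_type": [
            "Entire home/apt", "Entire home/apt", "Private room",
            "Entire home/apt", "Entire home/apt", "Private room",
        ],
        "minimum_nights": [1, 2, 3, 30, 2, 90],
    }
)

# Label each listing by its presumed audience.
listings["audience"] = listings["minimum_nights"].map(
    lambda nights: "resident" if nights >= LONG_STAY_NIGHTS else "tourist"
)

counts = listings["audience"].value_counts()
tourist_room_types = listings.loc[
    listings["audience"] == "tourist", "room_type"
].value_counts()

print(counts)
print(tourist_room_types)
```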

Bookings in 2020

Occupancy rate and Monthly income

Occupancy rate

The large majority of listings seem to reach the maximum occupancy rate allowed by the function. It is probable that the review rate considered is lower than the real one.
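The function itself is not reproduced here, so the following is only a sketch of a review-based occupancy estimate with a cap, in the spirit of what the text describes. Every parameter value (review rate, average stay, maximum occupancy) and both function names are assumptions chosen for illustration.

```python
def estimate_occupancy(
    reviews_ltm: int,
    minimum_nights: int,
    review_rate: float = 0.5,      # assumed share of stays that leave a review
    avg_stay_nights: float = 3.0,  # assumed average length of a stay
    max_occupancy: float = 0.7,    # assumed cap on the occupancy rate
    days: int = 365,
) -> float:
    """Estimate the yearly occupancy rate of a listing from its review count."""
    bookings = reviews_ltm / review_rate
    nights_per_booking = max(avg_stay_nights, minimum_nights)
    occupancy = bookings * nights_per_booking / days
    return min(occupancy, max_occupancy)


def monthly_income(price: float, occupancy: float, days: int = 365) -> float:
    """Average monthly income implied by a nightly price and an occupancy rate."""
    return price * occupancy * days / 12


print(estimate_occupancy(10, 1))                   # below the cap
print(estimate_occupancy(60, 1))                   # hits the cap
print(estimate_occupancy(60, 1, review_rate=0.8))  # higher review rate, lower estimate
```

Under these assumptions, a lower review rate inflates the estimated number of bookings, which is one way many listings can end up at the cap.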

Income per month

These histograms show that the distribution of monthly income for individual listings is skewed rather than even, with most values ranging between $50 and $2,000.

Income per neighbourhood

This histogram shows the general distribution of monthly income (x-axis) when grouped by neighbourhood. It shows the presence of a couple of extreme values (with the neighbourhood all the way to the right being 'Mission Bay'). Since the histogram does not seem to be a good representation of the data when all the values are considered, the values above mean + std are filtered out.
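The grouping and filtering described above can be sketched as follows. The figures are invented, and the column names `neighbourhood` and `monthly_income` are assumptions.

```python
import pandas as pd

# Hypothetical per-listing monthly income.
listings = pd.DataFrame(
    {
        "neighbourhood": ["A", "A", "B", "C", "D", "Mission Bay", "Mission Bay"],
        "monthly_income": [400.0, 600.0, 1500.0, 2000.0, 1200.0, 12000.0, 8000.0],
    }
)

# Summed monthly income per neighbourhood.
income_by_nbhd = listings.groupby("neighbourhood")["monthly_income"].sum()

# Drop neighbourhoods whose total exceeds mean + std of the totals.
cutoff = income_by_nbhd.mean() + income_by_nbhd.std()
filtered = income_by_nbhd[income_by_nbhd <= cutoff]

print(filtered)
```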

In the same way as before, the histogram shows that the monthly income is not evenly distributed (in terms of values), with most incomes clustered in the $500-$15,000 range. Despite this, and even though the extreme values were filtered out, there is still a relatively high number of neighbourhoods with high summed monthly incomes (>$20,000). The problem with this type of histogram is that it does not explicitly differentiate between neighbourhoods.

While these histograms do not show anything particularly different from what has been described (the distribution of monthly income per neighbourhood is, for the most part, skewed), they do show that for some neighbourhoods the distribution of monthly income (across the range of values) is more even, and would even appear to follow the shape of a bell curve.

Income per host

This comparison partly shows how unevenly distributed the gains from Airbnb are. The gains for the top 10 "hosts", which in reality are probably private companies in the vacation rental market (or representatives acting in their name), are almost three times as high as those for the bottom 1000 hosts.
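A comparison between the highest-earning and lowest-earning hosts can be sketched like this. The sample is deliberately tiny, so it compares the top 2 with the bottom 3 rather than the top 10 with the bottom 1000; the function name `top_vs_bottom` and the column names are assumptions.

```python
import pandas as pd


def top_vs_bottom(income_per_host: pd.Series, top_n: int, bottom_n: int) -> float:
    """Ratio of the summed income of the top hosts to that of the bottom hosts."""
    ranked = income_per_host.sort_values(ascending=False)
    top_total = ranked.head(top_n).sum()
    bottom_total = ranked.tail(bottom_n).sum()
    return top_total / bottom_total


# Hypothetical per-listing monthly income, summed per host.
listings = pd.DataFrame(
    {
        "host_id": [1, 1, 2, 3, 4, 5, 6],
        "monthly_income": [5000.0, 4000.0, 3000.0, 500.0, 300.0, 200.0, 1000.0],
    }
)
income_per_host = listings.groupby("host_id")["monthly_income"].sum()

ratio = top_vs_bottom(income_per_host, top_n=2, bottom_n=3)
print(ratio)
```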

When the monthly income values above mean + std are filtered out, only a relatively small number of 'hosts' is removed.

As when the income per month was grouped by neighbourhood, the grouping by host also shows an unequal distribution, with most values concentrated in the $50-$2,000 range.

Considerable source of income?

According to the calculation above, approximately 40% of the hosts could obtain a substantial portion of their income through Airbnb.
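The threshold used to decide what counts as a 'substantial' income is not given in the text. The sketch below shows how such a share could be computed once a threshold is chosen; the value of `SUBSTANTIAL_MONTHLY_INCOME` and the sample incomes are hypothetical.

```python
import pandas as pd

SUBSTANTIAL_MONTHLY_INCOME = 1000.0  # hypothetical threshold, in dollars

# Hypothetical estimated monthly income per host (index = host id).
income_per_host = pd.Series(
    [200.0, 1500.0, 800.0, 2500.0, 950.0], index=[1, 2, 3, 4, 5]
)

# The mean of a boolean Series is the fraction of True values.
share = (income_per_host >= SUBSTANTIAL_MONTHLY_INCOME).mean()
print(f"{share:.0%} of hosts reach the threshold")
```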

Charts for occupancy rate and income per month

As previously shown with the histogram, this bar graph shows that the occupancy rates across all the neighbourhoods remain relatively high, with the majority falling in the range between 20% and 50%.

This bar graph also shows, now more clearly, how uneven the distribution of monthly income is when grouped by neighbourhood.

GRAPHS AND CONCLUSIONS

In terms of the distribution of listings, this map also confirms what had been previously found with the tables: most of them are located in the already mentioned neighbourhoods.

With some of the questions above, a pattern had already begun to emerge: listings and income are not evenly distributed across the city, at least for the majority of neighbourhoods (mostly due to the presence of what can be assumed to be private agencies); despite this, the listings do not seem to be rented most of the time to medium or long-term residents, but mostly to tourists.

With this map, this pattern, or part of it, seems to be explained, as some of the aforementioned neighbourhoods are located along the coast, a location where the tourism industry is expected to be strong.

Accordingly, given all of what has been explained, the majority of the gains from Airbnb in the city are clustered in the neighbourhoods located along the coast.

In general, the most important conclusion from this preliminary analysis is that, despite some inequalities surrounding the Airbnb market, those inequalities should not have, at least theoretically and based on all the assumptions made, a significantly large impact on the long-term housing market.